CoSelect: Feature Selection with Instance Selection for Social Media Data

نویسندگان

  • Jiliang Tang
  • Huan Liu
چکیده

Feature selection is widely used in preparing highdimensional data for effective data mining. Attributevalue data in traditional feature selection differs from social media data, although both can be large-scale. Social media data is inherently not independent and identically distributed (i.i.d.), but linked. Furthermore, there is a lot of noise. The quality of social media data can vary drastically. These unique properties present challenges as well as opportunities for feature selection. Motivated by these differences, we propose a novel feature selection framework, CoSelect, for social media data. In particular, CoSelect can exploit link information by applying social correlation theories, incorporate instance selection with feature selection, and select relevant instances and features simultaneously. Experimental results on real-world social media datasets demonstrate the effectiveness of our proposed framework and its potential in mining social media data.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

IFSB-ReliefF: A New Instance and Feature Selection Algorithm Based on ReliefF

Increasing the use of Internet and some phenomena such as sensor networks has led to an unnecessary increasing the volume of information. Though it has many benefits, it causes problems such as storage space requirements and better processors, as well as data refinement to remove unnecessary data. Data reduction methods provide ways to select useful data from a large amount of duplicate, incomp...

متن کامل

Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features

Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...

متن کامل

Feature Selection with Linked Data in Social Media

Feature selection is widely used in preparing highdimensional data for effective data mining. Increasingly popular social media data presents new challenges to feature selection. Social media data consists of (1) traditional high-dimensional, attribute-value data such as posts, tweets, comments, and images, and (2) linked data that describes the relationships between social media users as well ...

متن کامل

Linked Unsupervised Based Advanced Feature Selection Framework with Artificial Bee Colony for Social Media Data

The explosive usage of social media produces large amount of unlabeled and high-dimensional data. Feature selection has been proven to be effective in dealing with high-dimensional data for efficient learning and data mining. Unsupervised feature selection remains a challenging task due to the absence of label information based on which feature relevance is often assessed. Existing work investi...

متن کامل

Improving Chernoff criterion for classification by using the filled function

Linear discriminant analysis is a well-known matrix-based dimensionality reduction method. It is a supervised feature extraction method used in two-class classification problems. However, it is incapable of dealing with data in which classes have unequal covariance matrices. Taking this issue, the Chernoff distance is an appropriate criterion to measure distances between distributions. In the p...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013